Fix firstOnly selection behavior #152

jrom99 · 2024-09-12T18:18:43Z

`firstOnly` used to select one match per combination of object, chain, and segi, so if one object had multiple chains, each chain would match once.

This change makes `firstOnly` match only once per object, on the first segi+chain available (in alphabetical order).
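A minimal sketch of the intended rule, written in plain Python so it runs without PyMOL. The `Match` tuple and the `first_match_per_object` helper are hypothetical names used only for illustration; they are not taken from the findseq source, which may implement the rule differently.

```python
from typing import NamedTuple


class Match(NamedTuple):
    """Hypothetical record describing where a pattern matched."""
    obj: str    # object name
    segi: str   # segment identifier
    chain: str  # chain identifier
    resi: str   # residue number where the match starts


def first_match_per_object(matches):
    """Keep one match per object: the one on the alphabetically first (segi, chain)."""
    best = {}
    for m in matches:
        current = best.get(m.obj)
        # Strict comparison: on a tie, the match seen first is kept.
        if current is None or (m.segi, m.chain) < (current.segi, current.chain):
            best[m.obj] = m
    return [best[name] for name in sorted(best)]


matches = [
    Match("1abc", "", "B", "10"),
    Match("1abc", "", "A", "42"),
    Match("2xyz", "", "A", "7"),
]
selected = first_match_per_object(matches)
for m in selected:
    print(m.obj, m.chain, m.resi)
```

Under the old behavior all three matches would be kept (one per object+chain+segi); under the new one, only chain A of each object survives.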

jrom99 · 2024-09-12T18:21:51Z

Another issue I've noticed: for objects that have residue sequence data available but whose residues lack structural information (as in loops), this script is unable to find the sequence to select.

I'm not sure how to update the selection behavior to fix it, though.

pslacerda · 2024-09-12T19:26:17Z

I'll check the `firstOnly` issue soon. Can you post a script here that exercises your findseq use case?

As for your missing-residue issue, I have an idea that seems to work. The sequence data is certainly available in RCSB PDB and mmCIF files, but it may be missing in files from other sources; I don't know when that is the case.

The only API command related to findseq is cmd.get_fastastr. Could you see if it works in your case? In my case the FASTA string is retrieved in full, but iterate and cmd.get_model don't return the complete sequence, and I don't know why.

https://github.com/schrodinger/pymol-open-source/blob/9d3061ca58d8b69d7dad74a68fc13fe81af0ff8e/modules/pymol/exporting.py#L169

pslacerda · 2024-09-12T21:53:43Z

The ONE_LETTER table has some errors, such as the mapping 'CRF':'TWG', which will ruin the analysis whenever that residue is matched. There are also entries like 'A ':'A', which are not useful.

Edit: I checked some values in ONE_LETTER and I don't trust it.
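One plausible way such an entry could break matching, sketched in plain Python: if a single residue expands to several letters, positions in the sequence string no longer line up with positions in the residue list. The table below is a hypothetical excerpt (only the 'CRF':'TWG' entry comes from the comment above), and the "X" placeholder is an assumption for illustration, not the fix adopted in this PR.

```python
import re

# Hypothetical excerpt; only 'CRF': 'TWG' is taken from the discussion.
TABLE = {"ALA": "A", "GLY": "G", "SER": "S", "CRF": "TWG"}

residues = ["ALA", "CRF", "GLY", "SER"]

# One residue contributes three letters, so the string is longer than the residue list.
sequence = "".join(TABLE[r] for r in residues)

hit = re.search("GS", sequence)
print(sequence, len(sequence), len(residues))
print("match starts at string index", hit.start())

# With exactly one letter per residue, string indices map back to residues again.
one_letter = "".join(TABLE[r] if len(TABLE[r]) == 1 else "X" for r in residues)
fixed_hit = re.search("GS", one_letter)
print(one_letter, "match starts at residue", residues[fixed_hit.start()])
```

In the first search the match index points past the end of the residue list, while GLY actually sits at index 2.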

pslacerda · 2025-05-01T03:35:38Z

I reverted your commits because I tested them a few hours ago and they weren't working. I shouldn't have merged them without testing.

You can recover your commits from this PR branch if you need to. Take your time...

Remove dependency on hardcoded ONE_LETTER dictionary

jrom99 · 2025-06-19T18:05:19Z

cmd.get_fastastr gave the wrong FASTA string for my test file (1qys), returning a sequence with "?" instead of "M". However, cmd.iterate was able to retrieve the residues without structural information, so now we can find them as well. I took the liberty of renaming firstOnly to matchMode to reflect the behavior for multiple objects.

It worked as expected on my files, but I'd need your files and tests to confirm that it works on them too.

pslacerda · 2025-06-19T22:25:16Z

Note that the residues without structural information can be retrieved only when the file carries the appropriate metadata. Custom-made files will skip missing residues, I guess. If pertinent, this should be handled in code or stated explicitly in the documentation.

jrom99 · 2025-06-20T00:53:28Z

> Note that the residues without structural information can be retrieved only when the file carries the appropriate metadata. Custom-made files will skip missing residues, I guess. If pertinent, this should be handled in code or stated explicitly in the documentation.

Do you have one of these files so I can test the code?

pslacerda · 2025-06-20T09:50:53Z

I tested your branch patch-1 and it works, but it has merge conflicts. Can you force-push or otherwise resolve them?

jrom99 · 2025-06-20T15:19:16Z

I don't know how to force-push, so I resolved the conflict by editing it directly.

Another thing that I don't know if we should document in the help text is that this code (and the original version as well) only searches for non-overlapping matches. In a future update, how about adding an option to create a group and put each match into its own selection, which would allow for overlapping matches?
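To illustrate the non-overlapping behavior mentioned above, here is a small sketch using only Python's `re` module. The toy sequence and the capturing-lookahead trick are illustrative assumptions, not the approach findseq takes.

```python
import re

sequence = "AAAA"

# re.finditer resumes scanning after the end of each match,
# so matches never share residues.
non_overlapping = [m.span() for m in re.finditer("AA", sequence)]

# Wrapping the pattern in a capturing lookahead consumes no characters,
# so every start position is tried and overlapping hits are reported.
overlapping = [
    (m.start(), m.start() + len(m.group(1)))
    for m in re.finditer("(?=(AA))", sequence)
]

print(non_overlapping)
print(overlapping)
```

Each overlapping span could then go into its own selection, as proposed above, since a single selection cannot tell where one match ends and the next begins.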

pslacerda · 2025-06-20T22:56:24Z

I don't understand this regex in the examples. Is it multi-character? How does it work?

        # find the Potential N-linked glycosylation sites in 5fyj
        fetch 5fyj
        findseq N(?=[^P][ST]), 5fyj and chain G+B, 5fyj_pngs

jrom99 · 2025-06-21T03:02:39Z

> I don't understand this regex in the examples. Is it multi-character? How does it work?
>
>         # find the Potential N-linked glycosylation sites in 5fyj
>         fetch 5fyj
>         findseq N(?=[^P][ST]), 5fyj and chain G+B, 5fyj_pngs

It looks for a single character. This regex is using a lookahead assertion to match only N's that are followed by XY where X is not P and Y can be either S or T.

From Wikipedia:

> On the other hand, the attachment of a glycan residue to a protein requires the recognition of a consensus sequence. N-linked glycans are almost always attached to the nitrogen atom of an asparagine (Asn) side chain that is present as a part of Asn–X–Ser/Thr consensus sequence, where X is any amino acid except proline (Pro).
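The lookahead can be tried outside PyMOL with Python's `re` module. This is a small sketch on a made-up sequence (not taken from 5fyj), showing that each hit is a single N and that the two residues after it are checked but not consumed.

```python
import re

# Same pattern as in the findseq example: an N followed by
# any residue except P, then S or T. The lookahead is zero-width.
PNGS = re.compile(r"N(?=[^P][ST])")

# Made-up sequence for illustration only.
sequence = "ANGSNPTNKT"

hits = [(m.start(), m.group()) for m in PNGS.finditer(sequence)]
print(hits)
```

The N at index 4 is skipped because it is followed by P; the other two are reported as one-character matches.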

pslacerda force-pushed the master branch from 29385cf to c7763bd · February 7, 2025 06:41

speleo3 mentioned this pull request Jun 19, 2025

Update findseq for multiple objects #161

Open

Use iterate instead of get_model

0497112

Merge branch 'master' into patch-1

86667a5

pslacerda merged commit 0251a56 into Pymol-Scripts:master Jun 20, 2025
